Interpreting Clusters of World Cup Tweets
Abstract
Cluster analysis is a field of data analysis that extracts underlying patterns in data. One application of cluster analysis is text mining: the analysis of large collections of text to find similarities between documents. We used a collection of about 30,000 tweets extracted from Twitter just before the World Cup started. A common problem with real-world text data is linguistic noise; in our case, this means extraneous tweets unrelated to the dominant themes. To combat this problem, we created an algorithm that combines the DBSCAN algorithm with a consensus matrix, leaving only the tweets related to those dominant themes. We then used cluster analysis to identify the topics these tweets describe, clustering them with both k-means, a commonly used clustering algorithm, and Non-Negative Matrix Factorization (NMF), and comparing the results. The two algorithms gave similar results, but NMF proved faster and produced more easily interpreted results. We explored our results using two visualization tools, Gephi and Wordle.
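The abstract does not say how DBSCAN and the consensus matrix were combined, so the following Python sketch shows one plausible reading, not the authors' method. It assumes bag-of-words vectors compared by cosine distance, runs a hand-written DBSCAN several times over a grid of `eps` values, records in a consensus matrix how often each pair of tweets lands in the same cluster, and keeps a tweet only if it co-clusters with some other tweet in at least `threshold` of the runs. The function names, `eps_values`, `min_pts`, `threshold`, and the sample tweets are hypothetical; a real pipeline over 30,000 tweets would likely add TF-IDF weighting and stop-word removal and use a vectorized library instead of this quadratic pure-Python loop.

```python
# Hypothetical sketch of DBSCAN + consensus-matrix noise filtering (not the paper's code).
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts for one tweet (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(dist, eps, min_pts):
    """Plain DBSCAN on a precomputed distance matrix; label -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = [q for q in range(n) if dist[p][q] <= eps]
        if len(neighbors) < min_pts:
            labels[p] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[p] = cluster
        seeds = [q for q in neighbors if q != p]
        while seeds:
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = cluster  # non-core point reachable from a core point: border
                continue
            if labels[q] is not None:
                continue  # already assigned to this cluster
            labels[q] = cluster
            q_neighbors = [r for r in range(n) if dist[q][r] <= eps]
            if len(q_neighbors) >= min_pts:
                seeds.extend(q_neighbors)  # q is a core point: keep expanding
    return labels


def consensus_matrix(dist, eps_values, min_pts):
    """C[i][j] = fraction of DBSCAN runs in which tweets i and j share a cluster."""
    n = len(dist)
    counts = [[0] * n for _ in range(n)]
    for eps in eps_values:
        labels = dbscan(dist, eps, min_pts)
        for i in range(n):
            for j in range(n):
                if labels[i] != -1 and labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(eps_values)
    return [[c / runs for c in row] for row in counts]


def filter_noise(tweets, eps_values=(0.3, 0.5, 0.7), min_pts=3, threshold=0.5):
    """Keep tweets that co-cluster with another tweet in >= threshold of the runs."""
    vectors = [tf_vector(t) for t in tweets]
    dist = [[cosine_distance(a, b) for b in vectors] for a in vectors]
    consensus = consensus_matrix(dist, eps_values, min_pts)
    kept = []
    for i, tweet in enumerate(tweets):
        # Strongest agreement with any *other* tweet across all runs.
        support = max((consensus[i][j] for j in range(len(tweets)) if j != i), default=0.0)
        if support >= threshold:
            kept.append(tweet)
    return kept


tweets = [
    "world cup brazil opening match",
    "brazil world cup opening match tonight",
    "opening match world cup brazil",
    "neymar brazil world cup squad",
    "my cat ate breakfast",
]
print(filter_noise(tweets))  # keeps the four World Cup tweets, drops "my cat ate breakfast"
```

Varying `eps` is what gives the consensus matrix its meaning here: the Neymar tweet is noise at the tightest radius but joins the World Cup cluster in two of the three runs, so it survives, while the off-topic tweet never joins any cluster and is filtered out.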
Similar sources
A Case Study in Text Mining: Interpreting Twitter Data From World Cup Tweets
Cluster analysis is a field of data analysis that extracts underlying patterns in data. One application of cluster analysis is in text-mining, the analysis of large collections of text to find similarities between documents. We used a collection of about 30,000 tweets extracted from Twitter just before the World Cup started. A common problem with real world text data is the presence of linguist...
Text Analysis and Sentiment Polarity on FIFA World Cup 2014 Tweets
Social media has become one of the most popular communication tools for sharing opinions and everyday life-related events. Twitter as a micro-blogging service can be used to discover events and news in real time from anywhere in the world. As Twitter posts (tweets) are short and are being generated constantly, they are well-suited sources of streaming data for opinion mining and sentiment polari...
MLJ: Language-Independent Real-Time Search of Tweets Reported by Media Outlets and Journalists
In this demonstration, we introduce MLJ (MultiLingual Journalism, http://mljournalism.com), a first Web-based system that enables users to search any topic of latest tweets posted by media outlets and journalists beyond languages. Handling multilingual tweets in real time involves many technical challenges: language barrier, sparsity of words, and realtime data stream. To overcome the language ...
Robust Control of Power System Stabilizer Using World Cup Optimization Algorithm
In this paper, we propose a new optimized PID controller to stabilize the synchronous machine connected to an infinite bus. The synchronous machine is modeled with a fourth-order linear Phillips-Heffron model. In this research, the parameters of the PID controller are optimally achieved by minimizing a definite fitness function to move the unstable eigenvalue to the left side of im...
Publication date: 2014